Introduction

This tutorial provides a brief introduction to mapping data using R. My PhD work focuses on regional dialects, which means we’ll be working with regional data today. We want to map features that help us understand the regional distribution of language varieties, but the same approach works for many other regionally distributed features, for example social variables like income or access to education. Mapping is useful for linguistics, and it is also well established in fields like geography and ecology.

First Steps

In preparation for our maps, we’ll need to load a few packages. For the US maps we’ll use ‘maps’. For other country outlines we’ll use ‘rnaturalearth’, together with ‘rnaturalearthhires’ for high-resolution outlines. The ‘sf’ package handles the geographic information in a format that is easy to join and plot. The main work for the maps will be done with ggplot2, which is loaded as part of the tidyverse.

library(maps)  # to get US maps
library(rnaturalearth)  # mapping other country outlines
library(rnaturalearthhires)  # high-resolution outlines (not on CRAN; install from rOpenSci)
library(tidyverse)  # making pretty maps
library(sf)  # to change the geo-information to suitable format

Data

The data we’ll be using today is based on a collection of 1 billion Tweets (9 billion words). All of them are geocoded American Tweets collected between 2013 and 2014, and a US Twitter swearing data set was compiled from them. See Huang et al. (2016) and Grieve et al. (2017) for more information.

The first step is reading in the data set, which is hosted on my GitHub.

norm_swear <- read.table("https://raw.githubusercontent.com/danaroemling/mapping/main/r_ladies_april23/MAPPING_SWEARING.csv",
    header = TRUE, sep = ",")

The data set has 3,076 rows, one per location, and 53 columns: one column identifying the location (state plus county) and 52 columns for the swear words.

dim(norm_swear)
## [1] 3076   53

The locations are coded as state-county pairs. These are the first 10 rows of our data set.

head(norm_swear, 10)
##              county     ass asshole bastard   bitch bitched bitchy bloody
## 1   alabama,autauga 1520421   49600    9538  962106    6995   8903   5087
## 2   alabama,baldwin 1246775   54318    6578  807348    2334   7851  14004
## 3   alabama,barbour 2263661   29188    3243  959948    3243   6486   3243
## 4      alabama,bibb 1451192   14629    2926 1009398       0   8777      0
## 5    alabama,blount  559433   72969    4230  506556    2115   5288   3173
## 6   alabama,bullock 2168413   56605       0 1184354       0   8708      0
## 7    alabama,butler 2638306   38680   11282 1806683    6447   4835   3223
## 8   alabama,calhoun 1604872   38763    8012  917534    2166   5197   4115
## 9  alabama,chambers 1881425   34756    5902 1438120    1312   1967  20329
## 10 alabama,cherokee  380377   37028    1683  272660    1683   6732   5049
##    bullshit  cock   crap crappy  cunt    damn damnit damned  darn   dick
## 1    120184 15897 146255  13354 22892 1206925  19077   8903 13990 210481
## 2     98452  7002 109910  10397  9124  907073   9760  10185 10185 113729
## 3     74591  3243 113507  19458  3243 1258310      0  25945 22701 136209
## 4    105328     0  90699   8777  2926 1176168  17555   2926 17555  93625
## 5    101523  9518 201988   6345  9518  469543  13748  23266  8460  59222
## 6    182878  8708  21771   4354     0 1240960      0      0  4354 300443
## 7    164390  9670  46738   4835     0 1513359   3223  11282 14505 267537
## 8     95500  8446  74711   6713 10395 1027543  14292  14292 12344 147689
## 9    155419  7869  55741   3935  8525 1080065  13116   8525 10492 172469
## 10    40394  1683 121182  15148  3366  336617   5049  11782  6732  38711
##    dickhead douche douchebag dumbass  dyke   fag faggot fatass freaking friggin
## 1      3179  14626      6359   43241  2544 42605  40697   6359   167876    2544
## 2      2971  18884      5729   29069  2546 19521  15489   4031   170593    2546
## 3         0   3243      3243   22701     0  9729      0      0   175126    3243
## 4      2926   5852         0   11703  2926  2926   8777      0   187251    8777
## 5      9518  13748      4230   25381     0 16920  32783   4230   195643    5288
## 6         0   4354         0   52251 17417 13063  52251      0    47897       0
## 7         0   4835      1612   29010     0  6447  12893   1612    78972    1612
## 8       650   7363      2815   18840  3898 11044   9312   3032   104378    2599
## 9      1967    656         0   26231     0  9181   9837   8525   127221    1312
## 10     1683   6732      1683   15148     0  1683  10099      0   107717   13465
##       fuck fucked fucker fuckery fucking goddamn   gosh   hell    hoe  homo
## 1  1441570 212388  21620    6359  592017    5087  69948 695667 268347 17169
## 2  1137714 139191  15065    2546  462767    5941  81690 573101 252920  7426
## 3  1115615 158910  12972    3243  376196    9729  64861 901573 369710  3243
## 4  1351715 236989  20481   17555  833850    8777  38035 506162 298431  2926
## 5   775168  88832  12690    3173  319374    4230 132191 379653 131134  1058
## 6  1941993 278672  21771    8708  574760   13063  30480 735867 335277     0
## 7  2427177 328781  17728   14505  676902   27398  48350 862244 515735  1612
## 8  1305163 188184  12127    8879  401056    7363  52189 747323 293645  6063
## 9  1800109 220997  17706    5246  445273   15739  58364 908252 398713  3935
## 10  464532  50493  33662    5049  323152    1683  42077 373645  69006  5049
##    jackass motherfucker motherfucking nigger   piss pissed pissy  pussy    shit
## 1    13354        12082          3179   5087  69948 169148 11446 197763 2352169
## 2     4668         4456          2546   2971  71081 152770  4456 103969 1733094
## 3        0        19458          3243   3243  64861 139452  6486 204313 2085293
## 4        0         5852          5852      0  67293 201880     0 152141 2390371
## 5     2115         7403          3173      0 102580 155457 10575  32783  905244
## 6     4354        30480             0  13063  65314 117565  4354 296089 3239557
## 7        0         9670          4835      0  78972 262702 11282 269149 3932477
## 8     1083         7579          7146   3032  57170 159816  4331 161332 2190864
## 9     3279        21641         11148   1967  53774 172469  2623 255097 2863124
## 10   10099            0             0      0  42077  72373  1683  26929  540270
##    shittiest shitty  slut slutty twat whore
## 1       1908  52779 37518   3179 3179 46420
## 2       3395  41163 31615   5305 3819 32676
## 3       9729  22701 32431      0    0 25945
## 4          0  20481 29258      0    0 29258
## 5       2115  43359 35956   4230 2115 45474
## 6       4354  65314 21771   4354    0 13063
## 7          0  35457 19340   6447 4835 24175
## 8        866  25337 19273   3248 1083 20573
## 9        656   8525 16394   1967 1312 28198
## 10      1683  18514 10099   1683 6732 38711

For each county, Grieve et al. (2017) measured the relative frequency of each word per billion words: the frequency of that word in all Tweets originating from the county, divided by the total number of words in those Tweets, multiplied by 1 billion. All of these swear words are among the 10,000 most frequent word types in the corpus. Here is a summary of the swear words.
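As a quick illustration of this normalisation, the sketch below computes per-billion relative frequencies for two made-up counties. The county names, word counts and word totals are hypothetical and not taken from the corpus.

```r
# Hypothetical word counts and total word counts for two counties
counts <- data.frame(
  county      = c("county_a", "county_b"),
  word_count  = c(5, 120),      # occurrences of one swear word
  total_words = c(1e7, 3e8)     # all words in Tweets from that county
)

# relative frequency per billion words = (word count / total words) * 1e9
counts$per_billion <- counts$word_count / counts$total_words * 1e9
counts
```

Normalising this way makes counties with very different amounts of Twitter activity directly comparable.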

summary(norm_swear[, 2:ncol(norm_swear)])
##       ass             asshole          bastard           bitch        
##  Min.   :      0   Min.   :     0   Min.   :     0   Min.   :      0  
##  1st Qu.: 633648   1st Qu.: 42586   1st Qu.:  4957   1st Qu.: 522379  
##  Median : 861798   Median : 63874   Median :  9382   Median : 727158  
##  Mean   :1017504   Mean   : 67861   Mean   : 11108   Mean   : 790053  
##  3rd Qu.:1265534   3rd Qu.: 86399   3rd Qu.: 13992   3rd Qu.: 996426  
##  Max.   :8904228   Max.   :567215   Max.   :310376   Max.   :7340226  
##     bitched           bitchy           bloody          bullshit     
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.:     0   1st Qu.:  2677   1st Qu.:  3595   1st Qu.: 84135  
##  Median :  3382   Median :  6802   Median :  8779   Median :111614  
##  Mean   :  4898   Mean   :  8686   Mean   : 11871   Mean   :113866  
##  3rd Qu.:  6328   3rd Qu.: 11153   3rd Qu.: 14800   3rd Qu.:139169  
##  Max.   :508411   Max.   :283607   Max.   :591876   Max.   :714967  
##       cock              crap            crappy            cunt       
##  Min.   :      0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.:   5280   1st Qu.: 60728   1st Qu.:  4996   1st Qu.:  7365  
##  Median :  11406   Median : 88146   Median :  9905   Median : 17118  
##  Mean   :  14392   Mean   : 98504   Mean   : 11993   Mean   : 21028  
##  3rd Qu.:  17621   3rd Qu.:124366   3rd Qu.: 14921   3rd Qu.: 28560  
##  Max.   :1242999   Max.   :821355   Max.   :244499   Max.   :435954  
##       damn             damnit           damned            darn       
##  Min.   :      0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.: 578292   1st Qu.:  6902   1st Qu.:  4680   1st Qu.:  8764  
##  Median : 742200   Median : 14019   Median :  9002   Median : 13997  
##  Mean   : 794290   Mean   : 17008   Mean   : 11536   Mean   : 17030  
##  3rd Qu.: 944634   3rd Qu.: 22584   3rd Qu.: 14364   3rd Qu.: 20611  
##  Max.   :3846951   Max.   :368363   Max.   :235349   Max.   :263481  
##       dick            dickhead           douche         douchebag     
##  Min.   :      0   Min.   :    0.0   Min.   :     0   Min.   :     0  
##  1st Qu.: 106660   1st Qu.:    0.0   1st Qu.: 10964   1st Qu.:     0  
##  Median : 152443   Median :  828.5   Median : 20833   Median :  4616  
##  Mean   : 158922   Mean   : 2402.5   Mean   : 28204   Mean   :  7056  
##  3rd Qu.: 199590   3rd Qu.: 2873.2   3rd Qu.: 37840   3rd Qu.:  9268  
##  Max.   :1426300   Max.   :65772.0   Max.   :357483   Max.   :275330  
##     dumbass            dyke             fag             faggot      
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.: 16770   1st Qu.:     0   1st Qu.:  9904   1st Qu.:  9832  
##  Median : 25860   Median :  1272   Median : 18970   Median : 20571  
##  Mean   : 28604   Mean   :  2749   Mean   : 22751   Mean   : 25161  
##  3rd Qu.: 35503   3rd Qu.:  3755   3rd Qu.: 30006   3rd Qu.: 34156  
##  Max.   :301841   Max.   :133627   Max.   :301341   Max.   :308339  
##      fatass          freaking         friggin            fuck        
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :      0  
##  1st Qu.:     0   1st Qu.: 83326   1st Qu.:     0   1st Qu.: 982700  
##  Median :  2746   Median :118184   Median :  3146   Median :1392849  
##  Mean   :  3993   Mean   :129884   Mean   :  5199   Mean   :1429113  
##  3rd Qu.:  5230   3rd Qu.:162076   3rd Qu.:  6042   3rd Qu.:1835837  
##  Max.   :157093   Max.   :900328   Max.   :307630   Max.   :9527509  
##      fucked            fucker          fuckery          fucking       
##  Min.   :      0   Min.   :     0   Min.   :     0   Min.   :      0  
##  1st Qu.: 116515   1st Qu.: 12694   1st Qu.:     0   1st Qu.: 497703  
##  Median : 172967   Median : 21610   Median :  1760   Median : 724440  
##  Mean   : 177169   Mean   : 25477   Mean   :  3783   Mean   : 771299  
##  3rd Qu.: 231936   3rd Qu.: 32686   3rd Qu.:  5170   3rd Qu.: 991083  
##  Max.   :1133503   Max.   :261505   Max.   :277937   Max.   :4075971  
##     goddamn            gosh              hell              hoe         
##  Min.   :     0   Min.   :      0   Min.   :      0   Min.   :      0  
##  1st Qu.:  2482   1st Qu.:  47687   1st Qu.: 407059   1st Qu.:  64499  
##  Median : 10120   Median :  72236   Median : 498528   Median : 110291  
##  Mean   : 12649   Mean   :  82527   Mean   : 531404   Mean   : 155724  
##  3rd Qu.: 17096   3rd Qu.: 103684   3rd Qu.: 614383   3rd Qu.: 200468  
##  Max.   :231535   Max.   :2601908   Max.   :2770083   Max.   :1949566  
##       homo           jackass        motherfucker    motherfucking   
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.:     0   1st Qu.:     0   1st Qu.:  4362   1st Qu.:     0  
##  Median :  6561   Median :  4375   Median : 10130   Median :  3643  
##  Mean   :  7823   Mean   :  5626   Mean   : 11719   Mean   :  4905  
##  3rd Qu.: 10304   3rd Qu.:  7211   3rd Qu.: 15396   3rd Qu.:  6450  
##  Max.   :276932   Max.   :154447   Max.   :382482   Max.   :236967  
##      nigger            piss            pissed           pissy       
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.:     0   1st Qu.: 55314   1st Qu.:125086   1st Qu.:     0  
##  Median :  3022   Median : 71656   Median :160544   Median :  4476  
##  Mean   :  4694   Mean   : 75528   Mean   :168064   Mean   :  7008  
##  3rd Qu.:  6185   3rd Qu.: 91372   3rd Qu.:204570   3rd Qu.:  8866  
##  Max.   :295300   Max.   :475602   Max.   :747938   Max.   :293600  
##      pussy              shit            shittiest         shitty      
##  Min.   :      0   Min.   :       0   Min.   :    0   Min.   :     0  
##  1st Qu.:  59199   1st Qu.: 1207753   1st Qu.:    0   1st Qu.: 41925  
##  Median :  98144   Median : 1608105   Median : 2910   Median : 70316  
##  Mean   : 120961   Mean   : 1753575   Mean   : 4034   Mean   : 76213  
##  3rd Qu.: 154132   3rd Qu.: 2172223   3rd Qu.: 5878   3rd Qu.:102168  
##  Max.   :1488628   Max.   :12309084   Max.   :69425   Max.   :550661  
##       slut            slutty            twat            whore       
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.: 24016   1st Qu.:     0   1st Qu.:     0   1st Qu.: 25470  
##  Median : 38572   Median :  4735   Median :  3116   Median : 37280  
##  Mean   : 43301   Mean   :  5727   Mean   :  4640   Mean   : 41505  
##  3rd Qu.: 55780   3rd Qu.:  7934   3rd Qu.:  6112   3rd Qu.: 52613  
##  Max.   :547945   Max.   :150670   Max.   :130014   Max.   :547945

Mapping

Before we map our swearing data, we need to understand the basics of cartography in R.

Mapping the US

First, we need a map of the US, which we will format and use as a base onto which we plot our swear word relative frequencies. There are several stages to setting up a nice map; apart from the first step, which just involves reading in the underlying map, they are all optional.

Accessing Mapping Data in R

First, we need a US map. Fortunately, working with US data is very easy in R: the necessary outlines are available in the maps package, and ggplot2’s map_data() function converts them into a data frame that ggplot can plot.

usa <- map_data("usa")

Now we’ll have a look at a very basic US map, using ggplot2. Every ggplot needs three basic components: data, aesthetics, and geoms. The data is what we want to visualise. The geometric objects (geoms: points, shapes, lines, etc.) are the form in which the data is drawn. The aesthetics link the data to the geoms, telling R which variables control position, grouping, colour and so on. Here, longitude and latitude give the position, and group tells geom_polygon() which points belong to the same polygon.

ggplot() + geom_polygon(data = usa, aes(x = long, y = lat, group = group))
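The plot above uses plain Cartesian coordinates, so the outline can look stretched. A minimal variant, assuming you are happy with an approximate projection, adds ggplot2’s coord_quickmap(), which sets an aspect ratio suited to longitude/latitude data:

```r
library(ggplot2)  # map_data() and plotting
library(maps)     # provides the "usa" outline used by map_data()

usa <- map_data("usa")

usa_map <- ggplot() +
  geom_polygon(data = usa,
               aes(x = long, y = lat, group = group),
               fill = "lightgrey", colour = "black") +
  coord_quickmap()  # quick approximation of a Mercator-like aspect ratio
usa_map
```

coord_quickmap() is only an approximation; for the county maps later on, coord_sf() with a proper projection does this job.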

Other countries

If you want to map other countries, you can download and read in base mapping data (e.g. shapefiles), which are available from various sources. This is especially useful if you want to work with administrative regions and the like. For country outlines, you can also use library(rnaturalearth) or library(rworldmap). The example below shows how to produce a map of Germany, Austria and Switzerland (the German-speaking area, GSA).

This code chunk requests the outlines of the three countries by name from Natural Earth. Setting returnclass = "sf" returns them as an sf object that already contains the coordinates we need for mapping, and scale = "large" selects the high-resolution outlines provided by rnaturalearthhires.

gsa_outline <- ne_countries(country = c("Austria", "Germany", "Switzerland"), returnclass = "sf",
    scale = "large")

After this, we can have a look at our three countries using ggplot. geom_sf() draws the sf outlines directly and automatically adds coord_sf(), which keeps the relationship between x and y (the aspect ratio) correct, so we don’t need coord_fixed() here.

gsa <- ggplot(data = gsa_outline) + geom_sf()

gsa

Back to the US

Now we need to make sure our US data can be mapped. Since our data is at county level, we need not just the outline of the US but its counties as well, which we can also extract from the maps package.

counties <- map_data("county")
ggplot() + 
  geom_polygon(data = counties, 
               aes(x = long, y = lat, group = group),
               # to see the counties we add a colour for outline and filling
               color = "black", fill = "lightgrey", 
               linewidth = .1 )

Polishing our map

Now that we have a basic map of the US, we can make it look a bit nicer, so that subsequent maps are easier to read.

ggplot() + 
  geom_polygon(data = counties, 
               aes(x = long, y = lat, group = group),
               color = "black", fill = "white", 
               linewidth = .1 ) +
  theme_minimal() +  # sets the theme for the plot
  ggtitle("US Map with Counties") + # gives the plot a title
  theme(axis.title.x = element_blank(), # removes x axis title, here longitude
        axis.title.y = element_blank(), # removes y axis title, here latitude
        axis.text.x = element_blank(), # removes x axis text, here coordinates
        axis.text.y = element_blank(), # removes y axis text, here coordinates
        panel.grid.major = element_blank(), # removes grid lines
        panel.grid.minor = element_blank(), # removes grid lines
        plot.title = element_text(hjust = 0.5)) # centres title

Data Wrangling

Now that we have a base map and our data read in, we need to make sure the data can be mapped. This might look a bit complicated, but all we’re doing is getting the coordinate data for each county and joining it to our existing data set.

First, we get a map of the counties (i.e. the geo-information we need), save it as us_geo and have a quick look. We’re still using the same maps package as before, but map_data() stores each county as many rows, one per boundary point, whereas our data set has exactly one row per location. Converting the map with st_as_sf() from the ‘sf’ package gives us one row per county, with all of its coordinates stored in a single geometry column, so it can be matched to our data. We then combine the two data sets using dplyr’s left_join(), matching the map’s ID column to the county column of norm_swear.

us_geo <- st_as_sf(maps::map(database = "county", plot = FALSE, fill = TRUE))

ggplot(data = us_geo) + geom_sf()

us_geo_swear <- us_geo %>%
    left_join(norm_swear, by = c(ID = "county"))
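left_join() keeps every county in the map and fills in NA wherever a county ID has no partner in norm_swear, for example because of a spelling difference. The small sketch below uses hypothetical rows (the "examplecounty" IDs and the third ass value are made up) to show this, and uses anti_join() to list the unmatched IDs. On the real data you could run the same anti_join() on st_drop_geometry(us_geo); that check is a suggestion, not part of the original workflow.

```r
library(dplyr)

# Hypothetical stand-ins for us_geo (without geometry) and norm_swear
geo <- data.frame(ID = c("alabama,autauga", "alabama,baldwin", "alabama,examplecounty"))
swear <- data.frame(county = c("alabama,autauga", "alabama,baldwin", "alabama,example county"),
                    ass    = c(1520421, 1246775, 900000))

# Every row of geo is kept; the third gets NA because the spellings differ
joined <- left_join(geo, swear, by = c(ID = "county"))

# IDs in geo that found no partner in swear
unmatched <- anti_join(geo, swear, by = c(ID = "county"))
unmatched
```

Counties that end up with NA here are the ones that later appear unfilled on the maps.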

If you have a look at the new data frame us_geo_swear, you can see that it has one row per county with all the swear word columns from norm_swear, plus a final geom column. That column is a list-column: each entry holds the MULTIPOLYGON coordinates of one county, which we need for plotting.

# shows that it is an sf object, which is also a data frame
class(us_geo_swear)
## [1] "sf"         "data.frame"
# you can see that we now have a data frame that contains multipolygons
head(us_geo_swear)
## Simple feature collection with 6 features and 53 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.01778 ymin: 30.24071 xmax: -85.06131 ymax: 34.2686
## Geodetic CRS:  WGS 84
##                ID     ass asshole bastard   bitch bitched bitchy bloody
## 1 alabama,autauga 1520421   49600    9538  962106    6995   8903   5087
## 2 alabama,baldwin 1246775   54318    6578  807348    2334   7851  14004
## 3 alabama,barbour 2263661   29188    3243  959948    3243   6486   3243
## 4    alabama,bibb 1451192   14629    2926 1009398       0   8777      0
## 5  alabama,blount  559433   72969    4230  506556    2115   5288   3173
## 6 alabama,bullock 2168413   56605       0 1184354       0   8708      0
##   bullshit  cock   crap crappy  cunt    damn damnit damned  darn   dick
## 1   120184 15897 146255  13354 22892 1206925  19077   8903 13990 210481
## 2    98452  7002 109910  10397  9124  907073   9760  10185 10185 113729
## 3    74591  3243 113507  19458  3243 1258310      0  25945 22701 136209
## 4   105328     0  90699   8777  2926 1176168  17555   2926 17555  93625
## 5   101523  9518 201988   6345  9518  469543  13748  23266  8460  59222
## 6   182878  8708  21771   4354     0 1240960      0      0  4354 300443
##   dickhead douche douchebag dumbass  dyke   fag faggot fatass freaking friggin
## 1     3179  14626      6359   43241  2544 42605  40697   6359   167876    2544
## 2     2971  18884      5729   29069  2546 19521  15489   4031   170593    2546
## 3        0   3243      3243   22701     0  9729      0      0   175126    3243
## 4     2926   5852         0   11703  2926  2926   8777      0   187251    8777
## 5     9518  13748      4230   25381     0 16920  32783   4230   195643    5288
## 6        0   4354         0   52251 17417 13063  52251      0    47897       0
##      fuck fucked fucker fuckery fucking goddamn   gosh   hell    hoe  homo
## 1 1441570 212388  21620    6359  592017    5087  69948 695667 268347 17169
## 2 1137714 139191  15065    2546  462767    5941  81690 573101 252920  7426
## 3 1115615 158910  12972    3243  376196    9729  64861 901573 369710  3243
## 4 1351715 236989  20481   17555  833850    8777  38035 506162 298431  2926
## 5  775168  88832  12690    3173  319374    4230 132191 379653 131134  1058
## 6 1941993 278672  21771    8708  574760   13063  30480 735867 335277     0
##   jackass motherfucker motherfucking nigger   piss pissed pissy  pussy    shit
## 1   13354        12082          3179   5087  69948 169148 11446 197763 2352169
## 2    4668         4456          2546   2971  71081 152770  4456 103969 1733094
## 3       0        19458          3243   3243  64861 139452  6486 204313 2085293
## 4       0         5852          5852      0  67293 201880     0 152141 2390371
## 5    2115         7403          3173      0 102580 155457 10575  32783  905244
## 6    4354        30480             0  13063  65314 117565  4354 296089 3239557
##   shittiest shitty  slut slutty twat whore                           geom
## 1      1908  52779 37518   3179 3179 46420 MULTIPOLYGON (((-86.50517 3...
## 2      3395  41163 31615   5305 3819 32676 MULTIPOLYGON (((-87.93757 3...
## 3      9729  22701 32431      0    0 25945 MULTIPOLYGON (((-85.42801 3...
## 4         0  20481 29258      0    0 29258 MULTIPOLYGON (((-87.02083 3...
## 5      2115  43359 35956   4230 2115 45474 MULTIPOLYGON (((-86.9578 33...
## 6      4354  65314 21771   4354    0 13063 MULTIPOLYGON (((-85.66866 3...
# If you open the data frame and scroll to the last column, you can see the
# list-column of geometries: view(us_geo_swear)

Now that the data is prepared, we can map some swear words. Note that we now use geom_sf() instead of geom_polygon(): geom_sf() can handle the sf geometry we joined to our swear word data, and, as the name suggests, it fills a similar role.

This first map is a very basic choropleth map based on our variable “ass”:

ggplot() + geom_sf(data = us_geo_swear, aes(fill = ass))
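geom_sf() works with any sf object, not just the county map. As a self-contained sketch, the code below builds two hypothetical square "regions" with made-up values and fills them by value in the same way; the square() helper is purely illustrative.

```r
library(sf)
library(ggplot2)

# Hypothetical helper: a closed unit square starting at x0
square <- function(x0) {
  st_polygon(list(matrix(c(x0, 0,  x0 + 1, 0,  x0 + 1, 1,  x0, 1,  x0, 0),
                         ncol = 2, byrow = TRUE)))
}

# One row per region, with its geometry stored in a single column
toy <- st_sf(value = c(10, 30),
             geometry = st_sfc(square(0), square(1)))

# Each row is drawn as one polygon, filled according to its value
toy_map <- ggplot() + geom_sf(data = toy, aes(fill = value))
toy_map
```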

Let’s add the styling from our US map:

ggplot() +
  geom_sf(data = us_geo_swear, 
          aes(fill = ass)) +
  theme_minimal() +  
  coord_sf(crs = "ESRI:102003") + # this sets the projection for the map, which is Albers
  ggtitle("'Ass' Distribution in the US per County") + 
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(), 
        axis.text.y = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        plot.title = element_text(hjust = 0.5)) 

That looks roughly like what we want, so let’s rework it a bit. Note that we divide the relative frequencies of ‘ass’ by 10,000: the values are very large, and rescaling them makes the legend easier to read. Since the original values are per billion words, dividing by 10,000 gives occurrences per 100,000 words. So instead of, for example, 1,520,421 occurrences of ass per billion words in Autauga County, Alabama, we now show 152.04 occurrences per 100,000 words.
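Instead of dividing inside aes() in every plot, you could also rescale the columns once with dplyr. A minimal sketch using the first two counties from head(norm_swear); the small data frame stands in for the full data set:

```r
library(dplyr)

# First two counties from head(norm_swear); values are per billion words
rates <- data.frame(county  = c("alabama,autauga", "alabama,baldwin"),
                    ass     = c(1520421, 1246775),
                    asshole = c(49600, 54318))

# 1e9 / 1e4 = 1e5, so dividing by 10,000 gives occurrences per 100,000 words
rates_100k <- rates %>% mutate(across(c(ass, asshole), ~ .x / 10000))
rates_100k
```

Rescaling once keeps the plotting code shorter, at the cost of the column no longer matching the per-billion values reported by Grieve et al. (2017).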

ggplot() +
  geom_sf(data = us_geo_swear, 
          aes(fill = ass / 10000), 
          lwd = 0.1, # lwd sets the outline thickness of the polygons
          color = "grey") + # this sets the outline colour
  theme_minimal() +  
  coord_sf(crs = "ESRI:102003") + 
  ggtitle("'Ass' Distribution in the US per County") +
  guides(fill = guide_legend(title = "Distribution")) + # sets a new legend title
  scale_fill_continuous(low = "white", # here we start using some nicer colours
                        high = "mediumpurple4") + 
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(), 
        axis.text.y = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        plot.title = element_text(hjust = 0.5),
        legend.title = element_text(size = 8))

There seems to be a trend towards ass in the Southeast. Let’s see whether other words show regional trends too.

ggplot() +
  geom_sf(data = us_geo_swear, aes(fill = dickhead / 10000), lwd = 0.1, color = "grey") +
  theme_minimal() +
  coord_sf(crs = "ESRI:102003") +
  ggtitle("'Dickhead' Distribution in the US per County") +
  guides(fill = guide_legend(title = "Distribution")) +
  scale_fill_continuous(low = "white", high = "mediumpurple4") +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_blank(), axis.text.y = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        plot.title = element_text(hjust = 0.5), legend.title = element_text(size = 8))

How about fuck, but in green?

ggplot() +
  geom_sf(data = us_geo_swear, 
          aes(fill = fuck / 10000), 
          lwd = 0.1, 
          color = "grey") + 
  theme_minimal() +  
  coord_sf(crs = "ESRI:102003") + 
  ggtitle("'Fuck' Distribution in the US per County") + 
  guides(fill = guide_legend(title = "Distribution")) + 
  scale_fill_continuous(low = "white", 
                        high = "aquamarine4") + # green this time?
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(), 
        axis.text.y = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        plot.title = element_text(hjust = 0.5),
        legend.title = element_text(size = 8))

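These maps look nice, but we can’t really see the distribution of the feature well: a few counties with extremely high values stretch the colour scale, so most counties end up in a narrow band of similar colours. To fix this, we introduce a concept called class intervals, in our case quantiles.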

Quantiles

In the next step we’ll implement quantiles for the swearing maps. This means we split the relative frequency distribution of the word we want to map into intervals. We use quantile-style intervals here: the breaks are chosen so that each interval contains a roughly equal number of counties, although the range of values covered by each interval will likely vary (often considerably).
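A small sketch with eight hypothetical values shows how quantile() and findInterval() turn a frequency column into classes, and why the rightmost.closed argument matters: by default, findInterval() puts the single largest value into a fifth class of its own.

```r
# Hypothetical relative frequencies for eight counties
x <- c(2, 4, 6, 8, 10, 12, 14, 16)

q <- quantile(x)  # breaks at 0%, 25%, 50%, 75%, 100%: 2, 5.5, 9, 12.5, 16

# Default behaviour: the maximum value ends up alone in class 5
default_classes <- findInterval(x, q)
table(default_classes)

# rightmost.closed = TRUE keeps the maximum in the top class: four equal groups
classes <- findInterval(x, q, rightmost.closed = TRUE)
table(classes)
```

An alternative is cut(x, breaks = q, include.lowest = TRUE), which returns a factor labelled with the interval ranges instead of class numbers.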

To do this, we’ll first pick a swear word and select it together with the geometry column to create a new data set. Then we’ll calculate the quantiles for our swear word (by default, quantile() returns the quartile breaks) and add the class each county falls into as a factor. Exchange the swear word in this code to run it with a different one.

# select the columns you need
quant_swear <- us_geo_swear %>%
    select(fuck, geom)
# calculate quantiles (by default the 0%, 25%, 50%, 75% and 100% breaks)
q <- quantile(quant_swear$fuck, na.rm = TRUE)
# assign each county to one of four classes; rightmost.closed = TRUE keeps
# the maximum value in class 4 instead of giving it a class of its own
quant_swear$quant <- factor(findInterval(quant_swear$fuck, q, rightmost.closed = TRUE))

Now we can map our data. Instead of filling the polygons by the frequency of our swear word, we fill them by the quantile classes we’ve just defined. This means we’re going from a continuous colour scale to a discrete one, so we need to change the colouring option of our map. That’s why we first define one colour per class.

cols <- c("1" = "white", 
          "2" = "lightsteelblue2", 
          "3" = "lightsteelblue3", 
          "4" = "lightsteelblue4")
ggplot() +
  # we've added na.omit to not have NAs plotted 
  geom_sf(data = na.omit(quant_swear), 
          aes(fill = quant), 
          lwd = 0.1, 
          color = "grey") + 
  # here we pass our colour list to the fill scale
  scale_fill_manual(values = cols) + 
  theme_minimal() +  
  coord_sf(crs = "ESRI:102003") + 
  ggtitle("'Fuck' Quantile Distribution in the US") + 
  guides(fill = guide_legend(title = "Quantiles")) + 
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(), 
        axis.text.y = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        plot.title = element_text(hjust = 0.5),
        legend.title = element_text(size = 8))

Let’s map the quantiles of another swear word and change the colours for the map. If you want to play around with colour yourself, this website offers a good overview.

quant_swear <- us_geo_swear %>%
    select(shit, geom)
q <- quantile(quant_swear$shit, na.rm = TRUE)
quant_swear$quant <- factor(findInterval(quant_swear$shit, q, rightmost.closed = TRUE))
cols <- c("1" = "white", "2" = "rosybrown2", "3" = "rosybrown3", "4" = "rosybrown4")
ggplot() +
  geom_sf(data = na.omit(quant_swear), aes(fill = quant), lwd = 0.1, color = "grey") +
  scale_fill_manual(values = cols) +
  theme_minimal() +
  coord_sf(crs = "ESRI:102003") +
  ggtitle("'Shit' Quantile Distribution in the US") +
  guides(fill = guide_legend(title = "Quantiles")) +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_blank(), axis.text.y = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        plot.title = element_text(hjust = 0.5), legend.title = element_text(size = 8))

If we still have time, we will create a different looking map and add some cities, so you can see a different example of making maps.

Adding cities

As the last bit, we’ll try out adding another layer to our ggplot maps. Remember our map for the German-speaking area.

gsa

If we want to add cities to this map, for example because we’re interested in city-level data, we can use geom_point(). Let’s first load some data to do this. The data here is a tiny portion of the data used by Hovy & Purschke in their 2018 paper.

gsa_data <- read.table("https://raw.githubusercontent.com/danaroemling/mapping/main/r_ladies_april23/MAPPING_DIALECT.csv",
    header = TRUE, sep = ",")

Note that we have a data set which contains both the linguistic information (here the counts and proportion) and the geolocation information. With this, we can map the data using the cities.

We start from the basic GSA map we stored as gsa and only add the city data in the geom_point() layer, using the Long and Lat columns of gsa_data for the position of each city.

gsa + 
  theme_minimal() +  
  geom_point(data = gsa_data, # here we add the cities to our map
             aes(x = Long, y = Lat, col = Proportion, size = (Count1+Count2)), 
             alpha = 0.9)  +
  guides(size = "none") +
  scale_color_gradient(low = "seagreen3", high = "mediumpurple3") +
  ggtitle("Schau vs Guck in the GSA") +
  theme(axis.title.x = element_blank(), 
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        panel.grid.major = element_blank(),
        plot.title = element_text(hjust = 0.5))

The map shows, for each city, the proportion of usage of the two features, with point size reflecting the total count. We can see that one feature is more prevalent in the north and the other in the south, so the map makes the distribution of this linguistic variable much easier to grasp than the gsa_data table.

Saving your output

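As the last step we want to save our map. By default, ggsave() saves the last plot that was displayed, so run it right after printing the map you want to keep.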

ggsave("german_map.png", width = 6.5, height = 5.5)
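To save a particular map regardless of what was printed last, you can store it in an object and pass it via the plot argument. The sketch below uses a tiny stand-in plot and a temporary file name (both illustrative assumptions) so it can run anywhere; in the tutorial you would pass your stored map instead.

```r
library(ggplot2)

# A small stand-in plot; in the tutorial this would be a stored map object
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) + geom_point()

out_file <- file.path(tempdir(), "example_map.png")

# plot = p saves this object explicitly; width and height are in inches by default
ggsave(out_file, plot = p, width = 6.5, height = 5.5, dpi = 300)
```

The file type is inferred from the extension, so changing ".png" to ".pdf" would produce a vector version of the same map.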